batch19 
QC REPORT 
Input files downloaded from:
 /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/mbrave_batch_data/batch19/ 
Output files are saved to:
 /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/ 

The consensus network .tsv file exists: TRUE 
The fasta file exists: TRUE 
The sample statistics file exists: TRUE 
The negative control statistics file exists: TRUE 
The positive control statistics file exists: TRUE

Statistics for the positive controls

Total number of positive controls: 96 
Number of positive controls per plate: 1 

All plates have positive controls: TRUE  
Total number of reads in positive controls: 27813 
Maximum number of reads: 469 in positive control sample: CONTROL_POS_CAMP_013_G12 
Minimum number of reads: 90 in positive control sample: CONTROL_POS_YARN_031_G12 

Average number of positive control reads: 289.71875 
Median number of positive control reads: 296.5 
Read standard deviation: 71.4152723051663 

Quantiles:
 5%: 134.75
10%: 213
25%: 249.5
50%: 296.5
75%: 329.5
95%: 392.25
100%: 469

Blue solid line: read mean
Orange dotted lines: 5% and 10% lower quantiles

Number of positive control samples in the lower 5% quantile: 5 

 CONTROL_POS_FACE_209_G12
CONTROL_POS_FACE_232_G12
CONTROL_POS_PBRI_001_G12
CONTROL_POS_PBRI_004_G12
CONTROL_POS_YARN_031_G12 

Names of the associated partners: BIFOR, PBRI, YARN
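
The lower-quantile flagging above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes quantiles are computed by linear interpolation (the default method in both R and NumPy) and that a control is flagged when its read count is strictly below the cut-off. The function names `quantile_linear` and `flag_below` are hypothetical. The read counts and the 134.75 cut-off are the values quoted in this report.

```python
def quantile_linear(values, p):
    """Quantile by linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p              # fractional index into the sorted data
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def flag_below(reads_by_sample, cutoff):
    """Return the names of samples whose read count is strictly below the cut-off."""
    return sorted(name for name, reads in reads_by_sample.items() if reads < cutoff)


# Read counts quoted in this report: the five low controls plus the batch maximum.
reads = {
    "CONTROL_POS_FACE_209_G12": 113,
    "CONTROL_POS_FACE_232_G12": 133,
    "CONTROL_POS_PBRI_001_G12": 131,
    "CONTROL_POS_PBRI_004_G12": 134,
    "CONTROL_POS_YARN_031_G12": 90,
    "CONTROL_POS_CAMP_013_G12": 469,
}

low = flag_below(reads, cutoff=134.75)  # 134.75 is the reported 5% quantile
print(low)
```

With the full set of 96 controls, `quantile_linear(all_reads, 0.05)` would supply the cut-off instead of the hard-coded value.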

Statistics for the negative controls

Total number of negative controls: 500 
Total number of lysate negative controls: 417 
Total number of empty negative controls: 83 

Number of negative controls per plate:
Negative controls per plate   Number of plates
1                             7
2                             76
more than 2                   13

All plates have negative controls: TRUE 

Total number of reads in lysate negative controls: 2096 
Total number of reads in empty negative controls: 62 
Maximum number of reads: 230 in lysate negative control sample: CONTROL_NEG_LYSATE_CAMP_017_H12 
Maximum number of reads: 6 in empty negative control sample: CONTROL_NEG_WWTS_005_A8 

Zero reads in: 286 negative control samples 
In lysate controls: 240 
In empty controls: 46 

Average number of negative control reads: 4.316 
In lysate controls: 5.02637889688249 
In empty controls: 0.746987951807229 
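
As a quick consistency check, the three averages can be recomputed from the totals and counts reported above. The variable names are illustrative only.

```python
# Totals and counts as reported in this section.
lysate_reads, lysate_n = 2096, 417
empty_reads, empty_n = 62, 83

# The overall mean is the pooled total divided by the pooled count,
# not the average of the two group means.
mean_all = (lysate_reads + empty_reads) / (lysate_n + empty_n)
mean_lysate = lysate_reads / lysate_n
mean_empty = empty_reads / empty_n

print(round(mean_all, 3), round(mean_lysate, 3), round(mean_empty, 3))
```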

Median number of negative control reads: 0 
In lysate controls: 0 
In empty controls: 0 

Skewness of negative control read counts: 9.3101349995259 
In lysate controls: 8.52792548498113 
In empty controls: 2.17703718332309 

Quantiles in lysate controls:
 5%: 0
10%: 0
25%: 0
50%: 0
75%: 3
95%: 30
98%: 37.68 

Quantiles in empty controls:
 5%: 0
10%: 0
25%: 0
50%: 0
75%: 1
95%: 2
98%: 3.72

Blue solid line: read mean
Orange dotted lines: upper 5% and 2% of samples with the highest number of reads

Number of negative control samples in the top 5%: 26 
 Of which lysate controls: 26 
 Of which empty controls: 0 

Number of negative control samples in the top 2%: 10 
 Of which lysate controls: 10 
 Of which empty controls: 0 

Names of the associated partners: CAMP, BIFOR, NENM, PBRI, WWTS

Statistics for the samples

Number of samples in the batch (excluding controls): 8620 
Total number of partner plates: 96 
Total number of sample reads: 2813154 

Maximum number of sample reads: 803 in sample: FACE_220_D9 
Minimum number of sample reads: 0 in 336 samples
 which is 3.89791183294664 % of all samples 

Average number of reads: 326.351972157773 
Median number of reads: 362 
Read standard deviation: 152.439735911066 
Skewness of sample read counts: -0.660035204264651 

Quantiles:
 5%: 1
10%: 68.9000000000001
25%: 240
50%: 362
75%: 435
95%: 523
100%: 803

Blue solid line: read mean
Orange dotted lines: lower 5% and 10% of samples

Number of samples in the lower 10%: 862 out of 8620 samples 
Number of samples in the lower 5%: 470 out of 8620 samples 

Partners associated with the bottom 5% of samples by read count:
Partner names Frequency
BIFOR 170
NENM 102
CAMP 61
YARN 51
WWTS 42
PBRI 40
SNST 4

Number of samples with 0 reads: 336

Plate boxplots

Plates where the 75th percentile of the data is lower than expected mean read count (dark grey):

 CAMP_039
FACE_209
FACE_227
PBRI_005
SNST_005
SNST_007
YARN_026
YARN_031 
 
which constitutes 8.33333333333333 % of all partner plates in this batch
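
A minimal sketch of the plate-level check described above, under stated assumptions: the report does not define how the "expected mean read count" is obtained, so it is passed in as a parameter here, and the 75th percentile is taken with the inclusive (linear interpolation) method. The function name `low_plates` and the toy plates are hypothetical; only the 8 of 96 figure comes from the report.

```python
from statistics import quantiles


def low_plates(reads_by_plate, expected_mean):
    """Return plates whose 75th percentile of read counts is below expected_mean."""
    flagged = []
    for plate, reads in reads_by_plate.items():
        q3 = quantiles(reads, n=4, method="inclusive")[2]  # third quartile
        if q3 < expected_mean:
            flagged.append(plate)
    return sorted(flagged)


# Hypothetical toy data, not taken from the batch.
toy = {
    "PLATE_A": [10, 20, 30, 40, 50],
    "PLATE_B": [200, 300, 400, 500, 600],
}
flagged = low_plates(toy, expected_mean=100)
share = 100 * len(flagged) / len(toy)

# The reported share: 8 flagged plates out of 96 partner plates.
reported_share = 100 * 8 / 96
print(flagged, share, reported_share)
```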

Grey line: median
Brown line: mean
Green data points: positive controls
Blue data points: empty negative controls
Navy data points: lysate negative controls

Percentage of samples from the low-performance partner plates that are present in the low-performance UMI plates (purple data points): 0 %

Assessment of the positive controls with low read counts detected in the previous steps:

FACE_209 More reads in positive control than in samples on average.
 Observed number of reads: 113 Expected: 105.440860215054 

FACE_232 Positive control failed.
 Observed number of reads: 133 Expected: 225.161290322581 

PBRI_001 Positive control failed.
 Observed number of reads: 131 Expected: 308.591397849462 

PBRI_004 Positive control failed.
 Observed number of reads: 134 Expected: 247.430107526882 

YARN_031 Positive control failed.
 Observed number of reads: 90 Expected: 214.537634408602 
YARN_031 
FACE_209 
The above plates have a lower than expected number of reads 
AND failed positive controls. 
THESE PLATES NEED TO BE EXAMINED FURTHER
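
The two plates listed above appear to be those that are both on the low-read plate list and have a positive control in the lower 5% quantile. A sketch of that cross-check, assuming control names follow the pattern CONTROL_POS_<PARTNER>_<PLATE>_<WELL>; the helper `plate_of` is hypothetical and the report's actual rule may differ.

```python
# Plates whose 75th percentile is below the expected mean (from the boxplot section).
low_read_plates = {
    "CAMP_039", "FACE_209", "FACE_227", "PBRI_005",
    "SNST_005", "SNST_007", "YARN_026", "YARN_031",
}

# Positive controls in the lower 5% quantile (from the positive control section).
low_positive_controls = [
    "CONTROL_POS_FACE_209_G12",
    "CONTROL_POS_FACE_232_G12",
    "CONTROL_POS_PBRI_001_G12",
    "CONTROL_POS_PBRI_004_G12",
    "CONTROL_POS_YARN_031_G12",
]


def plate_of(control_name):
    """CONTROL_POS_FACE_209_G12 -> FACE_209 (assumed naming scheme)."""
    parts = control_name.split("_")
    return "_".join(parts[2:4])


control_plates = {plate_of(name) for name in low_positive_controls}
plates_to_examine = sorted(low_read_plates & control_plates)
print(plates_to_examine)
```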

Low-quality plates are displayed here. All the other plates are plotted in the last part of this report.
Green squares: controls [any kind]

Assessment of sequence conflicts and contaminants

Positive control as contamination source

NOTE: All sample and sequence IDs match - data successfully merged
Positive control OTU is TAX:1287025 

Non-positive control samples that contain positive control reads:
Sample        Control Sequence Count   Sequence Similarity   Sequence Type   UMI Plate ID
FACE_240_B3   1                        99.23195              secondary       8
Number of samples with positive control OTU as primary sequence: NA 
Number of samples with positive control OTU as secondary sequence: 1 
out of 4842 samples with secondary sequences 

Location of the contaminants relative to the source:

Orange square: positive controls
Green squares: samples with positive control contamination

Read count mean of all secondary sequences in all samples: 5.10170152601043 
Read count mean of all positive control sequences in other samples: 1 

Read count median of all secondary sequences in all samples: 1 
Read count median of all positive control sequences in other samples: 1

Blue solid line: secondary hit read mean
Orange dotted lines: mean of reads found as secondary contaminants from the positive controls in other samples
Both lines should be in close proximity, meaning that the secondary contamination from positive controls is comparable to the potential contamination in other samples.

There are no samples that contain positive control reads as the primary hit
NO SAMPLES TO BE REMOVED
NOTE: the above samples are automatically removed if:
  • There’s only one primary read
  • Secondary sequence found in the same sample is not an Arthropod


Negative control contamination

    Distribution of reads in negative controls

    NOTE: contamination source can be either primary or secondary sequence within samples!
    Family No. Source Samples
    Chironomidae 64
    Entomobryidae 34
    Hominidae 7
    Sciaridae 7
    Agromyzidae 4
    Coccinellidae 3
    None 2
    Psychodidae 2

    Outline: negative controls with contaminants
    Colour of the outline indicates partners to track the samples between partner and UMI plates.
    Thicker chartreuse outline: FAILED negative controls with contaminants [2%]
    Numbers indicate the read count
    Squares that are not outlined represent potential sources of contamination within plates: identical sequences found within these wells and negative controls.

    Assessment of primary and secondary sequences

    NOTE: Controls are not included! 
    
    Number of wells with a primary sequence only: 3430 
    Number of wells with primary and secondary sequences: 4793 
    
    Number of primary chimeric sequences: 28 
    Number of secondary chimeric sequences: 5496 
    
    NOTE: All secondary chimeric sequences successfully removed
    Number of samples with only primary chimeric sequence recognised: 12 
    We do not know how mBRAVE recognises chimeras - for now only samples represented by fewer than 5 reads get removed
    Retained samples: 2
    Number of EXCLUDED primary sequences: 469 
     which constitutes 5.71045902836966 % of all samples 
    These samples are not being removed - it's an mBRAVE cut-off 
    
    Number of primary sequences with no taxonomy assigned: 110 
     which constitutes 1.33934007061975 % of all samples 
    These samples are going to be examined further

    Number of samples with no taxonomy assigned that will be replaced with the secondary sequence based on the sequence similarity: 19 
    Other sequences with no taxonomy assigned to the primary sequence will remain unchanged.

    If the first entry is not ‘Arthropod’, then the second entry is likely correct [based on manual observations]

    Number of samples with Wolbachia detected: 421 
    
    Table with plate positions, number of reads, and sequences saved to the output directory:
     /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/
    Number of samples with Nematoda, Tardigrada, Annelida, and/or Rotifera detected: 71 
    
    Table with plate positions, number of reads, and sequences saved to the output directory:
     /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/
    Taxon Frequency
    Chordata 38
    Nematoda 2
    Proteobacteria 64
    Rotifera 1
    105 wells had primary non-Arthropod hits and secondary Arthropod hits 
    NOTE: Primary hits are going to be replaced
     
    Samples with only non-Arthropod sequences detected: 56 
    NOTE: These samples have been excluded!
    Total number of wells with Anopheles reads: 82
    Plate Sequence No. samples with Anopheles reads
    CAMP_014 secondary 1
    CAMP_027 primary 8
    CAMP_027 secondary 1
    CAMP_038 primary 1
    CAMP_038 secondary 2
    CAMP_039 primary 5
    CAMP_039 secondary 6
    CAMP_040 primary 4
    CAMP_040 secondary 2
    CAMP_041 primary 4
    CAMP_041 secondary 4
    CAMP_042 primary 10
    CAMP_043 primary 15
    CAMP_043 secondary 3
    CAMP_044 primary 4
    CAMP_044 secondary 2
    CAMP_045 primary 1
    CAMP_046 primary 3
    CAMP_046 secondary 1
    SNST_007 primary 3
    SNST_007 secondary 1
    WWTS_003 secondary 1
    Partner Plate No. Anopheles primary sequences with 200+ reads
    CAMP 043 15
    CAMP 042 10
    CAMP 027 8
    CAMP 040 4
    CAMP 039 3
    CAMP 044 3
    CAMP 046 3
    CAMP 041 2
    CAMP 038 1
    CAMP 045 1
    Number of primary African Anopheline hits [200 or more reads]: 50 
    NOTE: All primary mosquito samples removed!
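
The Anopheles filter above can be sketched as a simple threshold on primary hits. This is an illustration under assumptions: the record layout `(sample, sequence_type, reads)`, the function name `anopheles_to_remove`, and the example hits are hypothetical, and the 200-read threshold is taken from the table heading. The per-plate counts are those listed above.

```python
# Per-plate counts of Anopheles primary sequences with 200+ reads, as listed above.
primary_200_plus = {
    "CAMP_043": 15, "CAMP_042": 10, "CAMP_027": 8, "CAMP_040": 4,
    "CAMP_039": 3, "CAMP_044": 3, "CAMP_046": 3, "CAMP_041": 2,
    "CAMP_038": 1, "CAMP_045": 1,
}
total_primary_200_plus = sum(primary_200_plus.values())


def anopheles_to_remove(hits, min_reads=200):
    """Select samples whose Anopheles hit is primary and has at least min_reads reads."""
    return [sample for sample, kind, reads in hits
            if kind == "primary" and reads >= min_reads]


# Hypothetical example hits, not taken from the batch.
example_hits = [
    ("WELL_A1", "primary", 250),
    ("WELL_B2", "secondary", 400),
    ("WELL_C3", "primary", 199),
]
removed = anopheles_to_remove(example_hits)
print(total_primary_200_plus, removed)
```

The per-plate counts sum to the 50 primary hits reported above.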

    Number of samples with only primary Arthropod sequence: 6290 
     79.309040474089 % of all remaining samples 
    
    Number of samples where secondary sequence is not present elsewhere on the partner or UMI plate: 0 
    Number of conflicting sequences [sequences are in different families or orders, both have good read support]: 130
    Primary hit Number
    Arthropoda 7830
    None 84
    Number of retained samples: 7914 
    Number of Arthropod samples assigned by mBRAVE [this includes samples with fewer than 5 reads that have now been excluded!]: 7983 
    Number of samples with replaced sequences: 37 
    Retained chimeras: 11 
    Retained samples with no taxonomy: 84 
    
    Each retrieved sample has only one sequence: TRUE
    Number of samples   Category   Decision   Description
    5262                1          YES        Only one sequence with more than 200 reads, no secondary sequence detected
    814                 2          YES        Only one sequence with 50 to 200 reads, no secondary sequence detected
    202                 3          YES        Only one sequence with 5 or more but fewer than 50 reads, no secondary sequence detected
    120                 4          YES        Dominant sequence with more than 200 reads, non-conflicting secondary sequences with 5 or fewer reads
    34                  5          YES        Dominant sequence with 50 to 200 reads, non-conflicting secondary sequences with 5 or fewer reads
    759                 6          YES        Dominant sequence with more than 200 reads, conflicting secondary sequences with 5 or fewer reads
    316                 7          YES        Dominant sequence with 50 to 200 reads, conflicting secondary sequences with 5 or fewer reads
    170                 8          NO         Dominant sequence with more than 200 reads, secondary sequences supported by more than 5 reads
    140                 9          NO         Dominant sequence with 50 to 200 reads, secondary sequences supported by more than 5 reads
    60                  10         NO         Dominant sequence with 5 or more but fewer than 50 reads, non-conflicting secondary sequences with fewer than 5 reads
    37                  11         NO         Dominant sequence with more than 5 but fewer than 50 reads, any other secondary reads present
    Decision category Number of samples
    NO 407
    YES 7507
    8.19025522041764 % OF SAMPLES EXCLUDED [all samples]
    12.9118329466357 % OF SAMPLES EXCLUDED [only approved samples]
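
The decision table above can be read as a small rule set. The sketch below is one possible reading, not the pipeline's implementation: the boundary handling (for example exactly 5 or exactly 200 reads) is an assumption where the descriptions are ambiguous, and the names `categorise` and `decision` are hypothetical. The category counts and batch size are those reported in this document.

```python
def categorise(primary_reads, secondary_reads=0, conflicting=False):
    """Return the decision category (1-11), or None below the 5-read minimum.

    secondary_reads: reads of the best-supported secondary sequence (0 if none).
    conflicting: True if the secondary sequence conflicts with the primary one.
    """
    if primary_reads < 5:
        return None
    band = "high" if primary_reads > 200 else "mid" if primary_reads >= 50 else "low"

    if secondary_reads == 0:                      # categories 1-3
        return {"high": 1, "mid": 2, "low": 3}[band]
    if band == "low":                             # categories 10-11
        return 10 if (not conflicting and secondary_reads < 5) else 11
    if secondary_reads > 5:                       # categories 8-9
        return 8 if band == "high" else 9
    if conflicting:                               # categories 6-7
        return 6 if band == "high" else 7
    return 4 if band == "high" else 5             # categories 4-5


def decision(category):
    """Categories 1-7 are retained as confident (YES); everything else is NO."""
    return "YES" if category is not None and category <= 7 else "NO"


# Samples per category, as reported in the table.
counts = {1: 5262, 2: 814, 3: 202, 4: 120, 5: 34, 6: 759,
          7: 316, 8: 170, 9: 140, 10: 60, 11: 37}
yes = sum(n for c, n in counts.items() if decision(c) == "YES")
no = sum(n for c, n in counts.items() if decision(c) == "NO")

batch_size = 8620  # samples in the batch, excluding controls
excluded_all = 100 * (batch_size - (yes + no)) / batch_size
excluded_confident = 100 * (batch_size - yes) / batch_size
print(yes, no, excluded_all, excluded_confident)
```

Read this way, the first percentage counts samples missing from the 7914 retained samples, and the second counts samples missing from the 7507 samples with a YES decision.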

    Plate heatmaps - retained samples


    NOTE: The heatmaps below show only the retained samples. Controls, chimeric samples, non-Arthropod samples, and samples with no taxonomy assigned have been removed or replaced!

    Final fasta file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/BOLD_filtered_sequences_batch19.fasta 
    Final metadata file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/BOLDfiltered_metadata_batch19.csv

    The report and output files have been successfully generated!


    Number of retained samples per partner plate
    Plate Original number of samples No. samples post-QC No. confident samples Percentage of confident samples
    PBRI_010 30 30 30 100.00000
    NENM_036 93 92 92 98.92473
    NENM_038 93 92 92 98.92473
    YARN_026 93 92 92 98.92473
    NENM_035 93 91 91 97.84946
    YARN_024 93 91 91 97.84946
    YARN_028 93 91 91 97.84946
    FACE_238 93 90 90 96.77419
    PBRI_002 93 91 90 96.77419
    PBRI_008 93 92 90 96.77419
    SHAP_026 93 92 90 96.77419
    YARN_027 93 92 90 96.77419
    FACE_239 93 91 89 95.69892
    SHAP_025 93 91 89 95.69892
    SNST_004 93 93 89 95.69892
    YARN_025 93 89 89 95.69892
    YARN_029 92 89 88 95.65217
    SNST_001 89 87 85 95.50562
    SNST_003 94 92 89 94.68085
    FACE_229 93 89 88 94.62366
    PBRI_007 93 89 88 94.62366
    SNST_002 92 89 87 94.56522
    SNST_005 92 88 87 94.56522
    CAMP_013 94 94 88 93.61702
    FACE_240 93 88 87 93.54839
    FACE_241 93 90 87 93.54839
    NENM_029 93 87 87 93.54839
    NENM_031 93 87 87 93.54839
    NENM_033 93 87 87 93.54839
    NENM_037 93 88 87 93.54839
    SNST_006 93 92 87 93.54839
    CAMP_048 93 90 86 92.47312
    FACE_237 93 90 86 92.47312
    NENM_028 93 86 86 92.47312
    NENM_030 93 86 86 92.47312
    NENM_032 93 86 86 92.47312
    WWTS_009 93 89 86 92.47312
    CAMP_038 93 88 85 91.39785
    FACE_228 93 87 85 91.39785
    FACE_235 93 85 85 91.39785
    PBRI_006 93 86 85 91.39785
    CAMP_047 93 91 84 90.32258
    FACE_219 93 88 84 90.32258
    NENM_027 93 84 84 90.32258
    WWTS_004 93 90 84 90.32258
    CAMP_014 94 87 84 89.36170
    CAMP_017 94 92 84 89.36170
    CAMP_042 93 83 83 89.24731
    SNST_007 34 30 30 88.23529
    FACE_231 93 88 82 88.17204
    FACE_234 93 85 82 88.17204
    NENM_039 93 82 82 88.17204
    WWTS_005 93 91 82 88.17204
    CAMP_026 74 66 65 87.83784
    CAMP_045 32 29 28 87.50000
    FACE_233 93 86 81 87.09677
    FACE_236 93 88 81 87.09677
    CAMP_027 74 65 64 86.48649
    FACE_222 93 88 80 86.02151
    PBRI_001 93 85 80 86.02151
    PBRI_004 93 82 80 86.02151
    CAMP_015 94 91 80 85.10638
    FACE_225 93 83 79 84.94624
    NENM_034 93 83 79 84.94624
    WWTS_008 93 84 79 84.94624
    PBRI_005 93 83 78 83.87097
    FACE_220 91 80 76 83.51648
    CAMP_018 94 89 78 82.97872
    CAMP_019 47 46 39 82.97872
    CAMP_044 93 83 77 82.79570
    FACE_221 93 83 77 82.79570
    YARN_032 93 78 77 82.79570
    CAMP_041 93 84 76 81.72043
    WWTS_006 93 87 76 81.72043
    WWTS_010 83 74 67 80.72289
    FACE_209 93 84 75 80.64516
    FACE_232 93 82 75 80.64516
    PBRI_009 93 91 75 80.64516
    WWTS_001 93 90 75 80.64516
    WWTS_007 93 83 75 80.64516
    YARN_030 93 81 74 79.56989
    WWTS_003 93 89 73 78.49462
    CAMP_040 93 77 72 77.41935
    FACE_223 93 83 72 77.41935
    YARN_031 93 74 72 77.41935
    CAMP_016 94 87 72 76.59574
    CAMP_043 93 74 71 76.34409
    CAMP_039 93 81 70 75.26882
    FACE_230 93 70 69 74.19355
    CAMP_046 93 72 68 73.11828
    FACE_224 93 80 68 73.11828
    PBRI_003 93 76 68 73.11828
    FACE_226 93 66 62 66.66667
    WWTS_002 93 82 58 62.36559
    FACE_227 93 55 51 54.83871
    NENM_040 64 20 20 31.25000

    Number of retained samples per partner
    Partner Original number of samples No. samples post-QC No. confident samples Percentage of confident samples
    SHAP 186 183 179 96.23656
    SNST 587 571 554 94.37819
    YARN 836 777 764 91.38756
    NENM 1273 1151 1146 90.02357
    PBRI 867 805 764 88.11995
    FACE 2230 1999 1891 84.79821
    CAMP 1721 1569 1454 84.48576
    WWTS 920 859 755 82.06522

    Number of retained samples per UMI plate
    Plate Original number of samples No. samples post-QC No. confident samples Percentage of confident samples
    22 371 364 361 97.30458
    10 372 354 353 94.89247
    8 372 359 353 94.89247
    16 312 303 293 93.91026
    24 372 364 349 93.81720
    19 372 343 343 92.20430
    21 362 343 333 91.98895
    9 311 290 285 91.63987
    15 368 355 337 91.57609
    7 372 348 334 89.78495
    12 372 340 331 88.97849
    20 372 340 327 87.90323
    13 309 303 270 87.37864
    17 376 359 324 86.17021
    18 309 292 265 85.76052
    3 370 335 312 84.32432
    2 353 306 296 83.85269
    6 372 326 307 82.52688
    1 372 330 303 81.45161
    4 372 334 299 80.37634
    14 372 352 297 79.83871
    23 372 305 291 78.22581
    5 372 297 286 76.88172
    11 343 272 258 75.21866
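
In the three tables above, the percentage of confident samples is consistent with confident samples divided by the original number of samples. A short check using the per-partner figures as reported; the function name `pct_confident` is illustrative.

```python
# (original, post-QC, confident) per partner, as reported above.
partners = {
    "SHAP": (186, 183, 179),
    "SNST": (587, 571, 554),
    "YARN": (836, 777, 764),
    "NENM": (1273, 1151, 1146),
    "PBRI": (867, 805, 764),
    "FACE": (2230, 1999, 1891),
    "CAMP": (1721, 1569, 1454),
    "WWTS": (920, 859, 755),
}


def pct_confident(original, confident):
    """Confident samples as a percentage of the original number of samples."""
    return 100 * confident / original


for name, (original, post_qc, confident) in partners.items():
    print(name, round(pct_confident(original, confident), 5))

# Column totals: these should match the batch size, the retained samples,
# and the samples with a YES decision reported earlier.
totals = tuple(sum(column) for column in zip(*partners.values()))
print(totals)
```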

    Plates with a low number of reads and a low number of retained samples should be examined!

    Samples to examine manually

    Failed negative controls [2%] with contamination other than Bovidae:
    
     CONTROL_NEG_LYSATE_CAMP_016_H12
    CONTROL_NEG_LYSATE_CAMP_017_H12
    CONTROL_NEG_LYSATE_FACE_222_H12
    
    These samples may have insects in them!

    Plate heatmaps - all [partner and UMI plates]